The garbage collector in CLR 4.0 has undergone some
significant changes, primarily the exposure of
additional diagnostics and the addition of a new garbage collection mode
called background garbage collection.
Extended Diagnostics
The SOS debugger commands give us insight into how the GC
works and what information we can use to troubleshoot difficult
application problems. With CLR 4.0, the SOS debugger extension has been
extended to include a new set of commands that further aid in
troubleshooting application problems related to the GC. In this section,
we will take a look at these new commands and how they can be used.
VerifyObj
The first command of interest is the VerifyObj command, which has the following syntax:
!VerifyObj <object address>
The command takes an object address and checks the object for possible
signs of corruption. The check focuses primarily on making sure that the
method table is intact, both for the object itself and for any contained
objects. If you suspect that a heap corruption is rearing its head, the
output of this command can serve as a quick indicator. Here is an example
of a corrupt object and the output of the VerifyObj command:
0:000>!VerifyObj 0x02126804
object 0x2126804 does not have valid method table
FindRoots
Finding the reason why
an object has not yet been collected can be a tedious process. Objects
that have “simple” roots are relatively straightforward, but at times,
an object’s root can be less than straightforward to spot. For example,
if an object has a cross-generational reference to it and the
referencing generation has not yet been collected, the object will still
appear to be live and well. To make life easier when detecting these
cross-generational references, the FindRoots command can be used:
!FindRoots -gen <N> | -gen any | <object address>
The FindRoots command instructs the runtime to set a breakpoint the next time a garbage collection occurs in the specified generation (using the -gen <N> switch) or anytime a garbage collection occurs regardless of the generation (using the -gen any switch). After the breakpoint hits, the FindRoots command can be fed an object’s address to display the roots for the object.
The first step in the process is typically finding the generation that the object belongs to by using the GCWhere command:
0:003> !GCWhere 00b0b400
Address Gen Heap segment begin allocated size
00b0b400 0 0 00b00000 00b01000 00b0c010 0xc(12)
The output shows that the object at address 0x00b0b400 belongs to generation 0.
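Next, we use the FindRoots command to break on the next garbage collection and resume execution. The -gen any switch is used here, so a collection in any generation triggers the breakpoint; in this run, it happens to be a gen 2 collection: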
0:003> !FindRoots -gen any
0:003> g
(710.970): CLR notification exception - code e0444143 (first chance)
CLR notification: GC - Performing a gen 2 collection. Determined surviving
objects...
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0013f118 ebx=00000000 ecx=00000000 edx=00000006 esi=0013f1dc edi=00000003
eip=7c812afb esp=0013f114 ebp=0013f168 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
KERNEL32!RaiseException+0x53:
7c812afb 5e pop esi
After the breakpoint is hit, we can use the FindRoots command with the object’s address to find out the roots of the object:
0:000> !FindRoots 00b0b400
Scan Thread 0 OSTHread 970
ESP:13fac8:Root: 01b01010(System.Object[])->
00b0ab38(System.Collections.Hashtable)->
00b0ab70(System.Collections.Hashtable+bucket[])->
00b0b400(System.Int32)
Scan Thread 2 OSTHread acc
DOMAIN(0016CB98):HANDLE(Pinned):9713fc:Root: 01b01010(System.Object[])->
00b0ab38(System.Collections.Hashtable)->
00b0ab70(System.Collections.Hashtable+bucket[])->
00b0b400(System.Int32)
HeapStat
The HeapStat
command shows a nice and detailed breakdown of the used and free bytes
for each generation on each managed heap. Additionally, it provides a
summary view showing free versus used memory (as a percentage) on the
small object heap (SOH) and large object heap (LOH). The syntax of the
command is
!HeapStat [-inclUnrooted | -iu]
The default output shows all rooted objects. The -inclUnrooted switch (or its -iu shortcut) can be used to include unrooted objects as well as rooted ones. Here is an example of running the HeapStat command on the 05OOM.exe application:
0:004> !HeapStat
Heap     Gen0       Gen1      Gen2      LOH
Heap0    2166844    134200    159064    33328
Free space                                        Percentage
Heap0    1865804    12        36        96        SOH: 75% LOH: 0%
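The output shows how much memory is in use in each generation: about
2.1MB in Gen 0, 134KB in Gen 1, 159KB in Gen 2, and finally 33KB in the
LOH. The free space row indicates that there is about 1.8MB available in
Gen 0, 12 and 36 bytes in Gen 1 and Gen 2, respectively, and 96 bytes in
the LOH. The percentage column summarizes this as 75% free on the SOH and
0% free on the LOH.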
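The percentages in the summary appear to follow from the per-generation figures. The short Python sketch below redoes the arithmetic on the numbers from the output above. It assumes that the first row is the size of each generation including its free space, and that the percentage is free bytes divided by that size, truncated to a whole number. Both assumptions are inferred from this one sample rather than taken from documentation, and the names used are illustrative only.

```python
# Figures copied from the !HeapStat output above (all in bytes).
sizes = {"Gen0": 2166844, "Gen1": 134200, "Gen2": 159064, "LOH": 33328}
free = {"Gen0": 1865804, "Gen1": 12, "Gen2": 36, "LOH": 96}

SOH_GENERATIONS = ("Gen0", "Gen1", "Gen2")


def free_percentage(generations):
    """Free bytes as a truncated percentage of the size (assumed formula)."""
    total = sum(sizes[g] for g in generations)
    unused = sum(free[g] for g in generations)
    return unused * 100 // total


soh_pct = free_percentage(SOH_GENERATIONS)
loh_pct = free_percentage(("LOH",))
print(f"SOH: {soh_pct}% LOH: {loh_pct}%")
```

Under these assumptions the sketch reproduces the SOH and LOH percentages reported by the command for this sample.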
GCWhere
Without a dedicated command, finding the generation an object belongs to
involves dumping out the managed heap segments (using the eeheap
command) and then matching the address of the object to one of the
segments listed in the output (the output specifies which generation each
segment corresponds to). This process may work fine if you’re only
trying to find the generation of one or two objects, but any more than
that and it becomes rather tedious. Fortunately, SOS 4.0 introduces a
command called GCWhere that displays information about the object passed in as an argument. The syntax of the command is
!GCWhere <object address>
Here is an example of the output when run against a FileStream object:
0:000> !GCWhere 0x01efd3f8
Address Gen Heap segment begin allocated size
01efd3f8 0 0 01ea0000 01ea1000 020f99cc 0x50(80)
The output shows the address of the object in question (0x01efd3f8), the generation to which the object belongs (Gen 0), the managed heap (0), the segment pointer (0x01ea0000), the starting address of the segment (0x01ea1000), the address up to which the segment has been allocated (0x020f99cc), and finally the size of the object (0x50, or 80 bytes). Please note that the size is not the recursive size (i.e., does not include the size of child objects).
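To make the column layout concrete, here is a small Python sketch that splits a GCWhere data row into named fields and checks that the object address falls between the segment's begin value and its allocated value. The function name and field names are hypothetical, chosen only for this illustration, and the sketch assumes the seven-column layout shown in the sample output.

```python
import re


def parse_gcwhere_row(row):
    """Split one data row of !GCWhere output into named fields (illustrative)."""
    address, gen, heap, segment, begin, allocated, size = row.split()
    # The size column looks like "0x50(80)": hex first, decimal in parentheses.
    match = re.fullmatch(r"0x([0-9a-fA-F]+)\((\d+)\)", size)
    if match is None:
        raise ValueError(f"unexpected size column: {size!r}")
    return {
        "address": int(address, 16),
        "generation": int(gen),
        "heap": int(heap),
        "segment": int(segment, 16),
        "begin": int(begin, 16),
        "allocated": int(allocated, 16),
        "size": int(match.group(2)),
    }


row = parse_gcwhere_row("01efd3f8 0 0 01ea0000 01ea1000 020f99cc 0x50(80)")
# The object should lie inside the allocated part of its segment.
in_segment = row["begin"] <= row["address"] < row["allocated"]
print(row["generation"], row["size"], in_segment)
```

A helper along these lines can be handy when the generation of many objects has to be looked up from a saved debugger log.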
ListNearObj
The ListNearObj
command can be used to validate the consistency of the heap. The
command takes an object address as an argument and attempts to validate
the specified object along with the objects immediately before and after
it. The syntax of the command is shown here:
!ListNearObj <object address>
For example, running the ListNearObj against an object that is valid and is surrounded by valid objects yields the following output:
0:000> !ListNearObj 0x01efd3f8
Before: 01efd3c8 48 (0x30) System.Collections.Hashtable+bucket[]
Current: 01efd3f8 80 (0x50) System.IO.FileStream
After: 01efd448 28 (0x1c) System.String
Heap local consistency confirmed.
The output is broken down into the before, current, and after
followed by the result of the validation. The before, current, and
after sections specify the object’s address, size, and type. In the
preceding example, all three objects were considered valid and therefore
the command considers the heap local consistency to be intact. If, on
the other hand, we run the command against an object that is corrupted
(where the size of the object has been overwritten), we see the
following output:
0:000> !ListNearObj 0x01efd3f8
Before: 01efd3c8 48 (0x30) System.Collections.Hashtable+bucket[]
After: 01efd448 28 (0x1c) System.String
Heap local consistency not confirmed.
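One way to see why an overwritten size breaks local consistency: on a contiguous stretch of the heap, each object's address plus its size should land exactly on the next object's address. The Python sketch below checks that arithmetic against the addresses and sizes from the first ListNearObj output above. It only illustrates the idea and assumes no padding between the objects; it is not a description of how SOS implements the check, and the function name is hypothetical.

```python
def is_locally_consistent(objects):
    """True if each (address, size) pair ends exactly where the next one begins."""
    return all(
        addr + size == next_addr
        for (addr, size), (next_addr, _) in zip(objects, objects[1:])
    )


# (address, size) pairs copied from the valid !ListNearObj output.
healthy = [(0x01EFD3C8, 0x30), (0x01EFD3F8, 0x50), (0x01EFD448, 0x1C)]

# Hypothetical corruption: the middle object's size field has been overwritten.
corrupt = [(0x01EFD3C8, 0x30), (0x01EFD3F8, 0x5000), (0x01EFD448, 0x1C)]

print(is_locally_consistent(healthy))
print(is_locally_consistent(corrupt))
```

With the corrupted size, walking forward from the current object no longer arrives at a valid neighbor, which is the kind of break the command reports.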
AnalyzeOOM
SOS 4.0 introduces a new command
called AnalyzeOOM that helps in the out-of-memory diagnosis process. The syntax for the command is shown here:
!AnalyzeOOM
Let’s use a small application called 10OOM.exe to illustrate how the command can be used. The 10OOM.exe
application simply sits in a tight loop and allocates large amounts of
memory until the memory is exhausted. Run the application under the
debugger until the out-of-memory exception is thrown:
(2b14.281c): C++ EH exception - code e06d7363 (first chance)
(2b14.281c): CLR exception - code e0434352 (first chance)
ModLoad: 75370000 75378000 C:\Windows\system32\VERSION.dll
Unhandled Exception: OutOfMemoryException.
(2b14.281c): CLR exception - code e0434352 (!!! second chance !!!)
eax=0030edc0 ebx=00000005 ecx=00000005 edx=00000000 esi=0030ee6c edi=003fa160
eip=75e242eb esp=0030edc0 ebp=0030ee10 iopl=0 nv up ei pl nz ac pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000216
KERNEL32!RaiseException+0x58:
75e242eb c9 leave
Next, we execute the AnalyzeOOM command:
0:000> !AnalyzeOOM
Managed OOM occured after GC #247 (Requested to allocate 1048576 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
The output
tells us that an out-of-memory condition occurred after the 247th
garbage collection and that the requested amount of memory was 1048576
bytes (1MB). It also gives us the reason behind the condition (low on
memory during garbage collection). Lastly, the details section tells us
that it failed to reserve 16777216 bytes (16MB) of memory, which corresponds to the smallest segment size on the small object heap.
In our example, it’s
quite clear why the out-of-memory condition occurred, but there may be
other reasons such as the CLR attempting to allocate internal data
structures or other components throwing out-of-memory exceptions.
Background Garbage Collection
Prior to CLR 4.0, the
garbage collector could work in two different modes. The first mode is
known as the workstation mode (or concurrent GC) and targets
applications running on workstations such as UI applications. The second
mode, known as server mode (or blocking GC), targets server-side
applications that typically do not require any UI. The reason behind
having two modes lies primarily in the response time while a garbage
collection occurs. While a garbage collection is underway, the execution engine and
associated managed threads must be periodically suspended to avoid
triggering another garbage collection. This suspension of managed
threads can create a short pause that is noticeable to
users of the application. In the case of workstation-type
applications with a UI, this can result in the UI flashing or in a
lag between a user click and the action
associated with the click. In these cases, it is crucial that the amount
of time that threads stay suspended be as small as possible.
The concurrent (or workstation) GC accomplishes this by only suspending
all the managed threads twice during a GC rather than throughout the
entire duration, as is the case with the server GC. During the time that
the managed threads are not suspended, they are allowed to keep
allocating memory up until the end of the ephemeral segment. If the
ephemeral segment is exhausted while a concurrent GC is underway, the
managed threads are suspended until the concurrent GC completes (turning
the concurrent GC into a blocking GC). In essence, this means that as
long as the ephemeral segment is not exhausted, lag times can be
avoided.
On the other hand, the
server GC doesn’t have to worry about immediate response times as much
as the workstation GC, because most server applications, unlike UI
applications, don’t need immediate response times. Instead of allowing
allocations to occur while the GC is working, the server mode GC keeps
all managed threads suspended throughout the duration of a GC. Although
this may result in a pause while a GC is underway (one that no user sees
directly, since there is typically no UI), the benefit of a server GC
is higher throughput, primarily because it does not have to worry about
other managed threads working at the same time. Additionally, in a
server GC, each processor has a dedicated GC thread, which means that
X GC threads can work in parallel (where X is the number of processors
multiplied by the number of cores per processor).
One of the primary
drawbacks of the existing concurrent GC is that it works well only
with applications that have relatively small managed heaps
(remember that managed threads can keep allocating memory only as long
as the ephemeral segment is not exhausted). In today’s world, it is not
uncommon to have applications whose managed heaps are in the gigabytes.
In these cases, lag times can still be experienced since the concurrent
GC becomes a blocking GC when the ephemeral segment limit has been
reached. To address this deficiency, CLR 4.0 replaces the concurrent GC
with what is known as the background GC. The biggest difference between a
concurrent GC and a background GC is that the background GC not only
allows allocations to happen while a full GC is in progress but also
allows collections of generations 0 and 1 during that time. The
background GC periodically checks to see if a concurrent allocation
triggered a GC in the ephemeral segment and, if so, suspends itself and
allows the ephemeral GC (known as a foreground GC) to take place. This
means that although a full GC is taking place, we still have the
capability to get rid of dead short-lived objects. Because the
background GC allows for collections in generations 0 and 1, what
happens if the threshold of the ephemeral segment is reached? In this
case, the foreground GC grows the ephemeral segment as needed.
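The difference between the two behaviors can be illustrated with a deliberately simplified model. The Python sketch below is a toy, not a description of the CLR's actual algorithm: it assumes a single ephemeral segment with a fixed starting capacity, treats every previously allocated object as dead when a foreground GC runs, and counts how many allocation requests would have to wait while one full GC is in progress. The function and parameter names are hypothetical.

```python
def count_blocked_allocations(requests, ephemeral_capacity, background):
    """Count allocation requests that must wait during one full GC (toy model).

    requests: allocation sizes issued while the full GC is in progress.
    background: False models the pre-4.0 concurrent GC, True the background GC.
    """
    used = 0
    blocked = 0
    exhausted = False
    for size in requests:
        if exhausted:
            # Threads stay suspended until the full GC completes.
            blocked += 1
            continue
        if used + size > ephemeral_capacity:
            if background:
                # Foreground GC: reclaim the ephemeral objects (toy assumption:
                # all of them are dead) and grow the segment if still too small.
                used = 0
                ephemeral_capacity = max(ephemeral_capacity, size)
            else:
                # Concurrent GC: the segment is exhausted, so the collection
                # effectively becomes a blocking one.
                exhausted = True
                blocked += 1
                continue
        used += size
    return blocked


requests = [40] * 5
print(count_blocked_allocations(requests, 100, background=False))
print(count_blocked_allocations(requests, 100, background=True))
```

In this model the concurrent collector blocks every request issued after the segment fills up, while the background collector services all of them by running a foreground collection in between.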
To summarize,
server GCs always block throughout the duration of a GC. To avoid lag
times during this suspension, the workstation GC was introduced, which
minimizes the time threads spend in the suspended state during a GC.
Although this approach works well with applications that have a
relatively small managed heap footprint, lag times can still be
observed if the ephemeral segment is exhausted. This deficiency led to
an evolution of the workstation GC called the background GC, which
allows for true concurrent allocations and ephemeral collections (and
segment expansion if needed).
Please note that in CLR 4.0, the background GC is only available in workstation mode.